Exploring UFC Fight Data: Trends, Techniques, and Geographic Spread

Author

Rodrick Mpofu

Published

May 9, 2024

Introduction

Mixed Martial Arts (MMA) has seen a significant rise in popularity, with the Ultimate Fighting Championship (UFC) at the forefront. This report delves into UFC fight data, aiming to uncover patterns in fight results, fighter attributes, and event locations. The analysis will focus on trends in fight outcomes, techniques used by fighters, and the geographic spread of UFC events. The dataset was obtained from Kaggle and contains information on fighters, fights, and preprocessed data.

The data set contains the following columns:

  • R_ and B_ prefixes signify red and blue corner fighter stats, respectively
  • Columns containing _opp_ hold the average damage done by the opponent to the fighter
  • KD is number of knockdowns
  • SIG_STR is no. of significant strikes ‘landed of attempted’
  • SIG_STR_pct is significant strikes percentage
  • TOTAL_STR is total strikes ‘landed of attempted’
  • TD is no. of takedowns
  • TD_pct is takedown percentages
  • SUB_ATT is no. of submission attempts
  • PASS is no. of times the guard was passed
  • REV is the no. of Reversals landed
  • HEAD is no. of significant strikes to the head ‘landed of attempted’
  • BODY is no. of significant strikes to the body ‘landed of attempted’
  • CLINCH is no. of significant strikes in the clinch ‘landed of attempted’
  • GROUND is no. of significant strikes on the ground ‘landed of attempted’
  • win_by is method of win
  • last_round is last round of the fight (ex. if it was a KO in 1st, then this will be 1)
  • last_round_time is when the fight ended in the last round
  • Format is the format of the fight (3 rounds, 5 rounds etc.)
  • Referee is the name of the Ref
  • date is the date of the fight
  • location is the location in which the event took place
  • Fight_type is which weight class and whether it’s a title bout or not
  • Winner is the winner of the fight
  • Stance is the stance of the fighter (orthodox, southpaw, etc.)
  • Height_cms is the height of the fighter in centimeters
  • Reach_cms is the reach of the fighter (arm span) in centimeters
  • Weight_lbs is the weight of the fighter in pounds (lbs)
  • age is the age of the fighter
  • title_bout is a Boolean value indicating whether it is a title fight or not
  • weight_class is which weight class the fight is in (Bantamweight, heavyweight, Women’s flyweight, etc.)
  • no_of_rounds is the number of rounds the fight was scheduled for
  • current_lose_streak is the count of the fighter’s current consecutive losses
  • current_win_streak is the count of the fighter’s current consecutive wins
  • draw is the number of draws in the fighter’s ufc career
  • wins is the number of wins in the fighter’s ufc career
  • losses is the number of losses in the fighter’s ufc career
  • total_rounds_fought is the average of total rounds fought by the fighter
  • total_time_fought(seconds) is the count of total time spent fighting in seconds
  • total_title_bouts is the total number of title bouts taken part in by the fighter
  • win_by_Decision_Majority is the number of wins by majority judges decision in the fighter’s ufc career
  • win_by_Decision_Split is the number of wins by split judges decision in the fighter’s ufc career
  • win_by_Decision_Unanimous is the number of wins by unanimous judges decision in the fighter’s ufc career
  • win_by_KO/TKO is the number of wins by knockout in the fighter’s ufc career
  • win_by_Submission is the number of wins by submission in the fighter’s ufc career
  • win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter’s ufc career
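
Several of the columns above are described as ‘landed of attempted’. As a minimal sketch, assuming the raw values are strings such as "23 of 50" (an assumption about the raw file, not something shown in this report), they could be split into numeric columns as follows. The helper name `parse_landed_of_attempted` is hypothetical.

```r
# Hypothetical helper: split "landed of attempted" strings into numbers.
# Assumes each value looks like "23 of 50"; anything else becomes NA.
parse_landed_of_attempted <- function(x) {
  parts <- strsplit(as.character(x), " of ", fixed = TRUE)
  landed <- vapply(parts, function(p) suppressWarnings(as.numeric(p[1])), numeric(1))
  attempted <- vapply(parts, function(p) suppressWarnings(as.numeric(p[2])), numeric(1))
  data.frame(
    landed = landed,
    attempted = attempted,
    # Accuracy is undefined when nothing was attempted
    pct = ifelse(attempted > 0, landed / attempted, NA_real_)
  )
}

sig_str <- parse_landed_of_attempted(c("23 of 50", "0 of 0", "7 of 10"))
print(sig_str)
```

Keeping landed and attempted as separate numbers makes it possible to recompute a percentage column such as SIG_STR_pct and to check it against the value supplied in the data.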

The questions that will be explored in this report include:

  1. What are the trends of different fighter statistics over time?
  2. What are the most common ways fighters win?
  3. Where are UFC fights most common in the world or in the US?
  4. Can users compare two fighters’ statistics side by side to see who the better fighter is?
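
Questions 2 and 3 both reduce to counting: tallying the win_by column and pulling a country out of location. The following is a rough sketch on made-up rows, assuming location is formatted with the country after the last comma (an assumption; the toy data frame `fights` and all of its values are invented for illustration).

```r
# Toy data invented for illustration; not taken from the Kaggle dataset
fights <- data.frame(
  win_by = c("Decision - Unanimous", "KO/TKO", "Submission",
             "KO/TKO", "Decision - Unanimous", "KO/TKO"),
  location = c("Las Vegas, Nevada, USA", "London, England, United Kingdom",
               "Las Vegas, Nevada, USA", "Rio de Janeiro, Brazil",
               "Newark, New Jersey, USA", "Toronto, Ontario, Canada"),
  stringsAsFactors = FALSE
)

# Question 2: most common methods of victory, largest first
win_counts <- sort(table(fights$win_by), decreasing = TRUE)

# Question 3: treat the text after the last comma as the country
fights$country <- trimws(sub(".*,", "", fights$location))
country_counts <- sort(table(fights$country), decreasing = TRUE)

print(win_counts)
print(country_counts)
```

The greedy `.*,` pattern removes everything up to the last comma, so the sketch works whether or not a state or region sits between the city and the country.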

Primary Visualization

Fighter Attributes and Outcomes

raw_fighter_df |>
  group_by(Stance) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  mutate(Stance = factor(Stance, levels = unique(Stance))) |>
  ggplot(aes(x = Stance, y= Count)) +
  geom_bar(stat = "identity", fill = "green2", color = "black") +
  theme_minimal(base_size = 15) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Fighting Stance", x = "Stance", y = "Count")

Key Insights

Fighters’ stances differ sharply in prevalence. Orthodox is the most common stance, followed by southpaw and switch, while the sideways stance is the least common.
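
The bar order in the chart above comes from fixing the factor levels after sorting by count; without that step the bars would follow the default alphabetical level order. A small base R sketch of the same idea, using an invented `stances` vector:

```r
# Invented sample; the real values come from the Stance column
stances <- c("Orthodox", "Southpaw", "Orthodox", "Open Stance",
             "Orthodox", "Southpaw", "Orthodox", "Southpaw")

# Count each stance and sort from most to least common
counts <- sort(table(stances), decreasing = TRUE)

# Default factor levels are alphabetical ...
default_levels <- levels(factor(names(counts)))
# ... while explicit levels preserve the order sorted by count
sorted_levels <- levels(factor(names(counts), levels = names(counts)))

print(default_levels)
print(sorted_levels)
```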

Weight Class Popularity

data_df |>
  group_by(weight_class) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  mutate(weight_class = factor(weight_class, levels = unique(weight_class))) |>
  ggplot(aes(x = weight_class, y = Count)) +
  geom_bar(stat = "identity", fill = "green2", color = "black") +
  theme_minimal(base_size = 15) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Weight Classes", x = "Weight Class", y = "Count")

Key Insights

Lightweight is the most populated class, followed by Welterweight and Middleweight. The Women’s Featherweight, Catch Weight, and Open Weight classes are far less common, indicating possible areas for the UFC to expand.

Shiny App

library(shiny)
library(shinythemes)
library(tidyverse)  # dplyr, tidyr, and ggplot2 are used throughout the app
library(glue)
library(fmsb)

choices_R <- data_df |>
  pull(R_fighter) |>
  unique()

choices_B <- data_df |>
  pull(B_fighter) |>
  unique()

choices <- union(choices_R, choices_B)

weights <- data_df |>
  pull(weight_class) |>
  unique()

ui <- fluidPage(
  # The theme applies to the whole page, so it is set on fluidPage()
  theme = shinytheme("darkly"),
  titlePanel("UFC Fighter Insights"),
  tabsetPanel(
    tabPanel("Fighter Statistics Over Time",
             sidebarLayout(
               sidebarPanel(
                 selectInput("selectedFighter", "Select a Fighter", choices = choices),
                 radioButtons("selectedWeight", "Select a Weight Class", choices = weights)
               ),
               mainPanel(
                 tabsetPanel(
                   tabPanel("Statistics Plot", plotOutput("statsPlot")),
                   tabPanel("Data Table", dataTableOutput("table"))
                 )
               )
             )
    ),
    tabPanel("Fighter Comparison",
             sidebarLayout(
               sidebarPanel(
                 selectInput("fighter1", "Fighter 1", choices = unique(data_df$R_fighter)),
                 selectInput("fighter2", "Fighter 2", choices = unique(data_df$B_fighter)),
                 actionButton("submit", "Submit")
               ),
               mainPanel(
                 plotOutput("plot"),
                 plotOutput("plot2")
               )
             )
    )
  )
)

server <- function(input, output, session) {
  # First app logic
  observeEvent(input$selectedWeight, {
    # Offer every fighter who has fought in this weight class, from either corner
    weight_df <- data_df |>
      filter(weight_class == input$selectedWeight)
    choices <- union(weight_df$R_fighter, weight_df$B_fighter)
    
    updateSelectInput(session, inputId = "selectedFighter", choices = choices)
  })
  
  ufc_reactive <- reactive({
    fighter_data <- data_df |>
      filter(R_fighter == input$selectedFighter | B_fighter == input$selectedFighter) |>
      mutate(Date = as.Date(date)) |>
      select(Date, R_fighter, B_fighter,
             R_avg_SIG_STR_landed, B_avg_SIG_STR_landed,
             R_avg_TD_landed, B_avg_TD_landed,
             `R_total_time_fought(seconds)`, `B_total_time_fought(seconds)`)
    
    fighter_stats <- fighter_data |>
      mutate(SIG_STR_landed = if_else(R_fighter == input$selectedFighter,
                                      R_avg_SIG_STR_landed, B_avg_SIG_STR_landed),
             TD_landed = if_else(R_fighter == input$selectedFighter,
                                 R_avg_TD_landed, B_avg_TD_landed),
             Total_time_fought = if_else(
               R_fighter == input$selectedFighter,
               `R_total_time_fought(seconds)`, `B_total_time_fought(seconds)`)) |>
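      # Put all three stats on a common 0 to 1 scale so they can share one axis;
      # scales:: is spelled out so this works even if library(scales) is not attached
      mutate(across(c(SIG_STR_landed, TD_landed, Total_time_fought),
                    ~ scales::rescale(.x, to = c(0, 1))))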
  })
  
  output$statsPlot <- renderPlot({
    ggplot(ufc_reactive(), aes(x = Date)) +
      geom_line(aes(y = SIG_STR_landed, colour = "Significant Strikes Landed")) +
      geom_line(aes(y = TD_landed, colour = "Takedowns Landed")) +
      geom_line(aes(y = Total_time_fought, colour = "Total Time Fought")) +
      labs(title = glue("Performance Over Time: {input$selectedFighter}"),
           x = "Date", y = "Scaled Value (0 to 1)") +
      scale_color_manual(values = c("Significant Strikes Landed" = "blue", 
                                    "Takedowns Landed" = "red", "Total Time Fought" = "green")) +
      theme_minimal()
  })
  
  output$table <- renderDataTable({
    data_df |>
      filter(weight_class == input$selectedWeight)|>
      mutate(Date = as.Date(date)) |>
      select(Date, R_fighter, B_fighter,
             R_avg_SIG_STR_landed, B_avg_SIG_STR_landed,
             R_avg_TD_landed, B_avg_TD_landed,
             `R_total_time_fought(seconds)`, `B_total_time_fought(seconds)`)
  })
  
  # Second app logic
  
  fighter1 <- reactive({
    data_df |>
      filter(R_fighter == input$fighter1) |>
      select(R_fighter, R_avg_SIG_STR_landed, 
             R_avg_TD_landed, R_avg_SUB_ATT, R_avg_REV, 
             R_avg_SIG_STR_pct, R_avg_TD_pct) |>
      rename(fighter = R_fighter,
             avg_SIG_STR_landed = R_avg_SIG_STR_landed,
             avg_TD_landed = R_avg_TD_landed,
             avg_SUB_ATT = R_avg_SUB_ATT,
             avg_REV = R_avg_REV,
             avg_SIG_STR_pct = R_avg_SIG_STR_pct,
             avg_TD_pct = R_avg_TD_pct)
  })
  
  fighter2 <- reactive({
    data_df |>
      filter(B_fighter == input$fighter2) |>
      select(B_fighter, B_avg_SIG_STR_landed, 
             B_avg_TD_landed, B_avg_SUB_ATT, B_avg_REV, B_avg_SIG_STR_pct, B_avg_TD_pct) |>
      rename(fighter = B_fighter,
             avg_SIG_STR_landed = B_avg_SIG_STR_landed,
             avg_TD_landed = B_avg_TD_landed,
             avg_SUB_ATT = B_avg_SUB_ATT,
             avg_REV = B_avg_REV,
             avg_SIG_STR_pct = B_avg_SIG_STR_pct,
             avg_TD_pct = B_avg_TD_pct)
  })
  
  fighter_full_df <- reactive({
    # Average each selected fighter's stats across all of their recorded bouts
    fighter_comp <- bind_rows(fighter1(), fighter2()) |>
      drop_na() |>
      group_by(fighter) |>
      summarise(across(everything(), mean))
    
    # Stop quietly if neither fighter has complete data
    req(nrow(fighter_comp) > 0)
    
    stats <- as.data.frame(select(fighter_comp, -fighter))
    # Take row names from the summarised data so labels match the row order
    rownames(stats) <- fighter_comp$fighter
    
    # radarchart() reads row 1 as the axis maximum and row 2 as the axis minimum;
    # the minimum is padded so the lower value is not drawn at the exact center
    max_values <- sapply(stats, max)
    min_values <- sapply(stats, function(x) min(x) - (max(x) * 0.1))
    
    rbind(max_values, min_values, stats)
  })
  
  output$plot2 <- renderPlot({
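    # Show a message in place of the plot until Submit has been clicked
    validate(need(input$submit > 0, "Please select two fighters and click submit"))
    
    # Set graphic colors: one border and one translucent fill per fighter
    # (brewer.pal() returns at least three colors, so keep the first two)
    coul <- RColorBrewer::brewer.pal(3, "Accent")[1:2]
    colors_border <- coul
    colors_in <- scales::alpha(coul, 0.3)
    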
    radarchart(fighter_full_df(),
               axistype = 5,
               # customize the polygons
               pcol = colors_border, pfcol = colors_in, plwd = 4, plty = 1,
               # customize the grid
               cglcol = "grey", cglty = 1, axislabcol = "black",
               caxislabels = seq(0, 20, 5), cglwd = 0.8,
               # customize the labels
               vlcex = 0.8
    )
    # Add a legend
    legend("topright", 
           legend = rownames(fighter_full_df()[-c(1, 2),]),
           fill = colors_in,
           border = colors_border,
           bty = "n",
           cex = 0.8, 
           title = "Fighter")
  })
  
  output$plot <- renderPlot({
    
    # Show a message in place of the plot until Submit has been clicked
    validate(need(input$submit > 0, "Please select two fighters and click submit"))
    
    # Average each selected fighter's stats across all of their recorded bouts
    fighter_full <- bind_rows(fighter1(), fighter2()) |>
      drop_na() |>
      group_by(fighter) |>
      summarise(across(everything(), mean))
    
    # Lollipop chart: one panel per stat, one stem and dot per fighter
    fighter_full |>
      pivot_longer(cols = -fighter, names_to = "stat", values_to = "value") |>
      ggplot(aes(x = fighter, y = value)) +
      geom_segment(aes(xend = fighter, yend = 0), color = "black") +
      geom_point(aes(color = fighter), size = 3) +
      facet_wrap(~stat, scales = "free_y") +
      labs(title = "Fighter Stats", x = "Fighter", y = "Value") +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 0, hjust = 0.5),
            strip.background = element_blank(),
            strip.text.x = element_text(size = 10))
    
  })
  
}

shinyApp(ui, server)
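
The statistics tab of the app rescales each stat to the 0 to 1 range before plotting, so that strikes, takedowns, and fight time can share one axis even though their raw magnitudes differ widely. The underlying min-max formula is (x - min) / (max - min). Below is a minimal base R sketch of that formula, with a hypothetical helper name `rescale01` and invented numbers; it is not the implementation used by the scales package.

```r
# Hypothetical helper illustrating min-max scaling to [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant series has no spread; returning 0.5 is a choice made here
  if (rng[1] == rng[2]) {
    return(ifelse(is.na(x), NA_real_, 0.5))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Invented values: cumulative fight time in seconds across four bouts
total_time <- c(300, 900, 1200, 2100)
scaled_time <- rescale01(total_time)
print(scaled_time)
```

Because the scaling is relative to the selected fighter’s own minimum and maximum, the plot shows the shape of each trend, not absolute values that can be compared across fighters.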

Static Look at the App

Key Insights

  • The app allows users to select a fighter from the dropdown menu to view how their statistics change over time.
  • The app allows users to compare two fighters based on their average stats in various categories through a lollipop chart.
  • The app uses a radar chart to visualize the comparison between the two fighters.
  • The radar chart provides a visual representation of the fighters’ stats in different categories, such as average significant strikes landed, average takedowns landed, average submission attempts, average reversals, average significant strike percentage, and average takedown percentage.
  • The radar chart highlights the strengths and weaknesses of each fighter, allowing users to compare their performance in different areas.
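
For reference, `fmsb::radarchart()` by default reads the first row of its data frame as each axis maximum and the second row as each axis minimum, with the series to draw in the rows after that. The following sketch assembles that layout in base R, using a hypothetical helper `build_radar_frame` and invented stats for two placeholder fighters. The 10% padding on the minimum mirrors the app code.

```r
# Hypothetical helper: prepend max and min rows in the layout radarchart() reads
build_radar_frame <- function(stats) {
  max_row <- vapply(stats, max, numeric(1))
  # Pad the minimum so the lower value is not drawn at the exact center
  min_row <- vapply(stats, function(x) min(x) - 0.1 * max(x), numeric(1))
  out <- rbind(max_row, min_row, stats)
  rownames(out) <- c("max", "min", rownames(stats))
  out
}

# Invented numbers for two placeholder fighters
stats <- data.frame(
  avg_SIG_STR_landed = c(40, 25),
  avg_TD_landed = c(1.5, 3.0),
  avg_SUB_ATT = c(0.2, 1.1),
  row.names = c("Fighter A", "Fighter B")
)

radar_data <- build_radar_frame(stats)
print(radar_data)
# With fmsb installed, this could then be drawn with fmsb::radarchart(radar_data)
```

Since each axis gets its own minimum and maximum, the polygons show which fighter leads in each category, but distances on different axes are not on a shared scale.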

Conclusion

In conclusion, the UFC data show clear patterns: Orthodox is the most common stance, and Lightweight, Welterweight, and Middleweight are the most populated weight classes. The accompanying app lets users select a fighter from a dropdown menu to view their stats over time, or compare two fighters side by side. Its lollipop and radar charts show each fighter’s average stats in different categories, which makes differences in performance easier to see. The app is a useful resource for UFC fans who want to learn more about their favorite fighters and how they perform in the Octagon. The sport is most prominent in the United States, but it has a global fan base that continues to grow.